Description

[0001] The invention relates to a classification method and apparatus, particularly to methods and apparatuses for 
classifying optically acquired character images. 

[0002] In the field of package shipping, packages are routed from origins to destinations throughout the world
according to destination addresses typed on shipping labels applied to these packages. In order to route packages, it is
desirable to use automated optical character classification systems that can read those addresses. Such a classification
system must be able to classify characters as quickly as possible. Conventional optical character classification systems
using spherical neurons, such as those disclosed in U.S. Patent No. 4,326,259 (Cooper et al.), may be unable to
execute the processing requirements presented by certain applications without a substantial investment in hardware.
[0003] The paper "Authentification dynamique de signatures par reseaux de neurones" in "Traitement du Signal",
8/1991, pages 423 and following, describes a signature authentication method based on handwriting. It proposes a
method in which, after signal normalization, dissimilarity measures between signatures are extracted. These measures
are then processed by a neural network.

[0004] The paper "Parallel and adaptive clustering method ..." in "1991. INT. SYMP. ON CIRCUITS AND SYSTEMS",
Vol. 1/5, June 11, 1991, pages 356 and following, by Y. Miyanaga et al., discloses a two-functional network in which
adaptive methods are implemented for sophisticated recognition and clustering. In a first subnet, clustering is based
on Mahalanobis distance in order to determine a similarity vector, based on which an optimum label is determined in
a second subnetwork consisting of nodes associated with specific labels.

[0005] The paper "A hyperellipsoid neural network ..." in "1991 IEEE INT. SYMP. ON CIRCUITS AND SYSTEMS",
Vol. 2/5, June 11, 1991, pages 1176 and following, by Jou Wu, Tsay and others, discloses a distance-based neural
network for pattern classification in which, for training purposes, a hyperellipsoid is constructed for each training pattern.
An attempt is then made to merge the hyperellipsoids of the same class without interfering with the hyperellipsoids of other classes.
[0006] It is the object of the invention to provide a classification method and apparatus capable of reliably classifying
an input into one of a plurality of possible outputs.

[0007] This object is accomplished in accordance with the features of the independent claims. Dependent claims
are directed to preferred embodiments of the invention.

Figs. 1(a), 1(b), 1(c), and 1(d) are bitmap representations of a nominal letter "O", a degraded letter "O", a nominal
number "7", and a degraded number "7", respectively;

Fig. 2 is a graphical depiction of a 2-dimensional feature space populated with 8 elliptical neurons that may be
employed by the classification system of the present invention to classify images of the letters A, B, and C;

Fig. 3 is a process flow diagram for classifying inputs according to a preferred embodiment of the present invention;

Fig. 4 is a schematic diagram of part of the classification system of Fig. 3;

Fig. 5 is a process flow diagram for generating neurons used by the classification system of Fig. 3; and

Fig. 6 is a schematic diagram of a classification system that uses cluster classifiers for classifying inputs according
to a preferred embodiment of the present invention.

[0008] The present invention includes a system for optical character recognition, but, more generally, the invention
covers a classification system for classifying an input as one of a defined set of possible outputs. For example, where
the input is an optically acquired image representing one of the 26 capital letters of the English alphabet, the
classification system of the present invention may be used to select as an output that capital letter that is associated with the
input image. The classification system of the present invention is discussed below in connection with Figs. 1(a), 2, 3,
and 4.

[0009] As background art, this specification also includes a system for "training" the classification system. This
training system is preferably operated off line prior to deployment of the classification system. In the character recognition
example, the training system accepts input images representative of known characters to "learn" about the set of
possible outputs into which unknown images will eventually be classified. The training system is discussed below in
connection with Fig. 5.

[0010] As background art, this specification also includes a system for training the classification system based on
ordering the training inputs according to the relative quality of the training inputs. This system for training is discussed 
below in connection with Figs. 1(a), 1(b), 1(c), and 1(d). 

[0011] As background art, this specification also includes a system for adjusting the locations and shapes of neurons
generated by the training system.
[0012] As background art, this specification also includes a classification system employing a hierarchical network
of top-level and lower-level cluster classifiers. The top-level classifier classifies inputs into one of a plurality of output
clusters, where each output cluster is associated with a subset of the set of possible outputs. A cluster classifier,
associated with the output cluster identified by the top-level classifier, then classifies the input as corresponding to one
of the possible outputs. This classification system is discussed below in connection with Figs. 1(a), 1(b), 1(c), and 1(d).
[0013] As background art, this specification also includes a neural system of classifying inputs that combines two
subsystems. One subsystem counts the number of neurons that encompass a feature vector representing a particular
input for each of the possible outputs. If one of the possible outputs has more neurons encompassing the feature vector
than any other possible output, then the system selects that possible output as corresponding to that input. Otherwise,
the second subsystem finds the neuron that has the smallest value for a particular distance measure for that feature
vector. If that value is less than a specified threshold, then the system selects the output associated with that neuron
as corresponding to the input. This neural system is discussed below in connection with Figs. 1(a), 2, 3, and 4.
[0014] Referring now to Fig. 1(a), there is shown a bitmap representation of a nominal letter "O". When the
classification system of the present invention classifies optically acquired character images, each character image to be
classified may be represented by an input bitmap, an (m x n) image array of binary values as shown in Fig. 1(a). In a
preferred embodiment, the classification system of the present invention generates a vector in a k-dimensional feature
space from information contained in each input bitmap. Each feature vector F has feature elements $f_j$, where
$0 \le j \le k-1$. The dimension of the feature space, k, may be any integer greater than one. Each feature element $f_j$
is a real value corresponding to one of k features derived from the input bitmap.

[0015] The k features may be derived from the input bitmap using conventional feature extraction functions, such
as, for example, the Grid or Hadamard feature extraction function. The feature vector F represents a point in the
k-dimensional feature space. The feature elements $f_j$ are the components of feature vector F along the feature-space
axes of the k-dimensional feature space. For purposes of this specification, the term "feature vector" refers to a point
in feature space.

[0016] In a preferred embodiment, a discriminant analysis transform may be applied to Grid-based or Hadamard- 
based feature vectors to define the feature space. In this embodiment, the separation between possible outputs may 
be increased and the dimensionality of the feature vector may be reduced by performing this discriminant analysis in 
which only the most significant Eigenvectors from the discriminant transformation are retained. 

[0017] The classification system compares a feature vector F, representing a particular input image, to a set of
neurons in feature space, where each neuron is a closed k-dimensional region or "hyper-volume" in the k-dimensional
feature space. For example, when (k = 2), each neuron is an area in a 2-dimensional feature space, and when (k = 3),
each neuron is a volume in a 3-dimensional feature space. Fig. 2 shows a graphical depiction of an exemplary
2-dimensional feature space populated with eight 2-dimensional neurons.

[0018] In a preferred classification system, the boundary of at least one of the neurons populating a k-dimensional
feature space is defined by at least two axes that have different lengths. Some of these neurons may be generally
represented mathematically as:

$$\sum_{j=0}^{k-1} \frac{|c_j - g_j|^m}{(b_j)^m} \le A, \qquad (1)$$

where $c_j$ define the center point of the neuron, $b_j$ are the lengths of the neuron axes, and m and A are positive real
constants. In a preferred embodiment, at least two of the neuron axes are of different length. The values $g_j$ that satisfy
Equation (1) define the points in feature space that lie within or on the boundary of the neuron. Those skilled in the art
will understand that other neurons within the scope of this invention may be represented by other mathematical
expressions. For example, a neuron may be defined by the expression:

$$\mathop{\mathrm{MAX}}_{j=0}^{k-1} \frac{|c_j - g_j|}{|b_j|} < 1, \qquad (2)$$

where the function "MAX" computes the maximum value of the ratio as j runs from 0 to k-1. Neurons defined by Equation
(2) are hyper-rectangles.
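
The two neuron shapes of paragraph [0018] can be illustrated with a short membership test. This is only a sketch under stated assumptions: the function names `inside_power_neuron` and `inside_rect_neuron` are hypothetical, the first function follows Equation (1) as reconstructed here (sum of m-th power ratios compared with A), and the second follows Equation (2) with a strict "less than 1" comparison.

```python
from typing import Sequence


def inside_power_neuron(g: Sequence[float], c: Sequence[float],
                        b: Sequence[float], m: float = 2.0, A: float = 1.0) -> bool:
    """Equation (1), as assumed here: sum_j |c_j - g_j|^m / b_j^m <= A."""
    total = sum(abs(cj - gj) ** m / bj ** m for gj, cj, bj in zip(g, c, b))
    return total <= A


def inside_rect_neuron(g: Sequence[float], c: Sequence[float],
                       b: Sequence[float]) -> bool:
    """Equation (2), as assumed here: max_j |c_j - g_j| / |b_j| < 1."""
    return max(abs(cj - gj) / abs(bj) for gj, cj, bj in zip(g, c, b)) < 1.0


# A 2-dimensional neuron centered at the origin with axes of different length.
center, axes = (0.0, 0.0), (2.0, 1.0)
print(inside_power_neuron((1.5, 0.5), center, axes))  # inside the ellipse
print(inside_power_neuron((1.9, 0.9), center, axes))  # outside the ellipse
print(inside_rect_neuron((1.9, 0.9), center, axes))   # still inside the rectangle
```

With m = 2 and A = 1 the first test describes a hyper-ellipse; the point (1.9, 0.9) shows that a hyper-rectangle with the same center and axes covers corner regions that the hyper-ellipse excludes.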

[0019] In a preferred embodiment, the neurons are hyper-ellipses in the k-dimensional feature space. A hyper-ellipse
is any hyper-volume defined by Equation (1), where (m = 2) and (A = 1). More particularly, a hyper-ellipse is defined by
the function:

$$\sum_{j=0}^{k-1} \frac{|c_j - g_j|^2}{(b_j)^2} \le 1, \qquad (3)$$

where $c_j$ define the hyper-ellipse center point, $b_j$ are the hyper-ellipse axis lengths, and the values $g_j$ that satisfy Equation
(3) define the points that lie within or on the hyper-ellipse boundary. When all of the axes are the same length, the
hyper-ellipse is a hyper-sphere. In a preferred embodiment of the present invention, in at least one of the neurons, at
least two of the axes are of different length. By way of example, there is shown in Fig. 2 elliptical neuron 1, having
center point $(c_0^1, c_1^1)$ and axes $b_0^1$, $b_1^1$ of different length. In a preferred embodiment, the axes of the neurons are
aligned with the coordinate axes of the feature space. Those skilled in the art will understand that other neurons having
axes that do not all align with the feature-space axes are within the scope of the invention.

[0020] Each neuron is associated with a particular possible output. For example, each neuron may correspond to
one of the 26 capital letters. Each neuron is associated with only one of the possible outputs (e.g., letters), but each
possible output may have one or more associated neurons. Furthermore, neurons may overlap one another in feature
space. For example, as shown in Fig. 2, neurons 0, 1, and 7 correspond to the character "A", neurons 2, 3, 5, and 6
correspond to the character "B", and neuron 4 corresponds to the character "C". Neurons 1 and 7 overlap, as do
neurons 2, 3, and 6 and neurons 3, 5, and 6. In an alternative embodiment (not shown), neurons corresponding to
different possible outputs may overlap. The classification system may employ the neurons of Fig. 2 to classify input
images representative of the letters A, B, and C.

[0021] Referring now to Fig. 3, there is shown a process flow diagram of classification system 300 for classifying an
input (e.g., a bitmap of an optically acquired character image) as one of a set of possible outputs (e.g., characters)
according to a preferred embodiment of the present invention. In the preferred embodiment shown in Fig. 3, the neurons
in classification system 300 are processed in parallel. In an alternative embodiment (not shown), the neurons of
classification system 300 may be processed in series. Means 302 is provided for receiving an input image bitmap and
generating a feature vector that represents information contained in that bitmap. Means 304 and 306 are provided for
comparing the feature vector generated by means 302 to a set of neurons, at least one of which has two or more axes
of different length. Classification system 300 selects one of the possible outputs based upon that comparison.

[0022] In a preferred embodiment, classification system 300 classifies optically acquired character bitmaps using a
network of hyper-elliptical neurons. Means 302 of classification system 300 receives as input the bitmap of an optically
acquired character image to be classified and generates a corresponding feature vector F. Means 304 then determines
an "elliptical distance" $r_x$ as a function of the center and axes of each of the $E_{num}$ hyper-elliptical neurons x in the
network and feature vector F, where:

$$r_x = \sqrt{\sum_{j=0}^{k-1} \frac{(f_j - c_j^x)^2}{(b_j^x)^2}}. \qquad (4)$$

In Equation (4), $c_j^x$ and $b_j^x$ define the center point and axis lengths, respectively, of neuron x, where x runs from 0 to
$E_{num}-1$, and $f_j$ are the elements of feature vector F. Those skilled in the art would recognize that distance measures
different from that of Equation (4) may also be used.
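
As an illustration of the elliptical distance of paragraph [0022], the following sketch computes $r_x$ for one neuron. It assumes the square-root form of Equation (4); because the encompassing test compares $r_x$ with 1, a squared form would give the same inside/outside decision, although a threshold other than 1 would then have to be squared as well. The function name `elliptical_distance` is hypothetical.

```python
import math
from typing import Sequence


def elliptical_distance(f: Sequence[float], center: Sequence[float],
                        axes: Sequence[float]) -> float:
    """Distance from feature vector f to a neuron center, with each
    coordinate difference scaled by the neuron axis length on that
    feature-space axis (assumed square-root form of Equation (4))."""
    return math.sqrt(sum(((fj - cj) / bj) ** 2
                         for fj, cj, bj in zip(f, center, axes)))


# Neuron centered at (1, 1) with axis lengths 4 and 2.
r_inside = elliptical_distance((3.0, 1.0), (1.0, 1.0), (4.0, 2.0))
r_on_boundary = elliptical_distance((5.0, 1.0), (1.0, 1.0), (4.0, 2.0))
print(r_inside, r_on_boundary)
```

A value below 1 means the feature vector lies inside the neuron, a value of exactly 1 places it on the boundary, and a value above 1 places it outside.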

[0023] Means 306 determines which, if any, of the $E_{num}$ neurons encompass feature vector F. A neuron encompasses
a feature vector, and may be referred to as an "encompassing neuron", if the feature vector lies inside the boundary
that defines the neuron in feature space. For hyper-ellipses, neuron x encompasses feature vector F if ($r_x < 1$). If
($r_x = 1$), feature vector F lies on the boundary of neuron x, and if ($r_x > 1$), feature vector F lies outside neuron x. Since
neurons may overlap in feature space, a particular feature vector may be encompassed by more than one neuron. In
Fig. 2, feature vector $F_g$, corresponding to a particular input image, is encompassed by neurons 2 and 6. Alternatively,
a feature vector may lie inside no neurons, as in the case of feature vector $F_h$ of Fig. 2, which corresponds to a different
input image.

[0024] Means 308 finds the "closest" neuron for each possible output. As described earlier, each neuron is associated
with one and only one possible output, but each possible output may have one or more neurons associated with it.
Means 308 analyzes all of the neurons associated with each possible output and determines the neuron "closest" to
feature vector F for that output. The "closest" neuron will be the one having the smallest "distance" measure value $r_x$.
In the example of feature vector $F_g$ of Fig. 2, means 308 will select neuron 1 as being the "closest" neuron to feature
vector $F_g$ for the character "A". It will also select neuron 2 as the "closest" neuron for character "B" and neuron 4 for
character "C".

[0025] Means 310 in Fig. 3 counts votes for each possible output. In a first preferred embodiment, each neuron that
encompasses feature vector F is treated by means 310 as a single "vote" for the output associated with that neuron.
In an alternative preferred embodiment discussed in greater detail with respect to Equation (7) below, each neuron
that encompasses feature vector F is treated by means 310 as representing a "weighted vote" for the output associated
with that neuron, where the weight associated with any particular neuron is a function of the number of training input
feature vectors encompassed by that neuron. In a preferred embodiment, means 310 implements proportional voting,
where the weighted vote for a particular neuron is equal to the number of feature vectors encompassed by that neuron.
For each possible output, means 310 tallies all the votes for all the neurons that encompass feature vector F. There
are three potential types of voting outcomes: either (1) one output character receives more votes than any other output
character, (2) two or more output characters tie for the most votes, or (3) all output characters receive no votes, indicating
the situation where no neurons encompass feature vector F. In Fig. 2, feature vector $F_g$ may result in the first type of
voting outcome: character "B" may receive 2 votes corresponding to encompassing neurons 2 and 6, while characters
"A" and "C" receive no votes. Feature vector $F_h$ of Fig. 2 results in the third type of voting outcome with each character
receiving no votes.

[0026] Means 312 determines if the first type of voting outcome resulted from the application of means 310 to feature
vector F. If only one of the possible output characters received the most votes, then means 312 directs the processing
of classification system 300 to means 314, which selects that output character as corresponding to the input character
bitmap. Otherwise, processing continues to means 316. For feature vector $F_g$ in Fig. 2, means 312 determines that
character "B" has more votes than any other character and directs means 314 to select "B" as the character
corresponding to feature vector $F_g$. For feature vector $F_h$ in Fig. 2, means 312 determines that no single character received
the most votes and directs processing to means 316.

[0027] Means 316 acts as a tie-breaker for the second and third potential voting outcomes, in which no outright
vote-leader exists, either because of a tie or because the feature vector lies inside no neurons. To break the tie, means 316
selects that neuron x which is "closest" in elliptical distance to feature vector F and compares $r_x$ to a specified threshold
value $\theta^m$. If ($r_x < \theta^m$), then means 318 selects the output character associated with neuron x as corresponding to the
input character bitmap. Otherwise, the tie is not broken and classification system 300 selects no character for the input
image. A "no-character-selected" result is one of the possible outputs from classification system 300. For example, if
classification system 300 is designed to recognize capital letters and the input image corresponds to the number "7",
a no-character-selected result is an appropriate output.

[0028] Threshold value $\theta^m$ may be any number greater than 1 and is preferably about 1.25. As described earlier,
when feature vector F is inside neuron x, then ($r_x < 1$), and when feature vector F is outside neuron x, then ($r_x > 1$). If
the voting result from means 310 is a tie for the most non-zero votes, then means 316 will select the output character
associated with the encompassing neuron having a center which is "closest" in elliptical "distance" to feature vector F.
Alternatively, if there are no encompassing neurons, means 316 may still classify the input bitmap as corresponding
to the output character associated with the "closest" neuron x, if ($r_x < \theta^m$). Using a threshold value $\theta^m$ of about 1.25
establishes a region surrounding each neuron used by means 316 for tie-breaking. In Fig. 2, feature vector $F_h$ will be
classified as character "C" if the "distance" measure $r_4$ is less than the threshold value $\theta^m$; otherwise, no character is
selected.
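
The voting and tie-breaking flow of means 304 through 318 can be summarized in a few lines. The sketch below is illustrative only: the neuron representation (a dictionary with `center`, `axes`, `label`, and an optional training `count`), the function names, and the square-root distance are assumptions made for the example, and the neurons are processed in series rather than in parallel.

```python
import math
from collections import defaultdict


def elliptical_distance(f, center, axes):
    # Assumed square-root form of the elliptical distance r_x.
    return math.sqrt(sum(((fj - cj) / bj) ** 2
                         for fj, cj, bj in zip(f, center, axes)))


def classify(f, neurons, threshold=1.25, weighted=False):
    """Return the selected label, or None for "no character selected"."""
    votes = defaultdict(int)
    closest = None  # (r_x, label) of the closest neuron seen so far
    for n in neurons:
        r = elliptical_distance(f, n["center"], n["axes"])
        if r < 1.0:  # neuron encompasses f: one (possibly weighted) vote
            votes[n["label"]] += n.get("count", 1) if weighted else 1
        if closest is None or r < closest[0]:
            closest = (r, n["label"])
    if votes:
        top = max(votes.values())
        leaders = [label for label, v in votes.items() if v == top]
        if len(leaders) == 1:  # first voting outcome: a single vote leader
            return leaders[0]
    # Second or third outcome: fall back to the closest neuron, if it is
    # within the threshold region.
    if closest is not None and closest[0] < threshold:
        return closest[1]
    return None


neurons = [
    {"center": (0.0, 0.0), "axes": (1.0, 1.0), "label": "A"},
    {"center": (3.0, 0.0), "axes": (2.0, 1.0), "label": "B"},
    {"center": (4.0, 0.0), "axes": (2.0, 1.0), "label": "B"},
    {"center": (10.0, 10.0), "axes": (1.0, 1.0), "label": "C"},
]
print(classify((3.5, 0.0), neurons))  # two "B" neurons encompass the point
print(classify((0.0, 1.2), neurons))  # no votes, but "A" is within 1.25
print(classify((0.0, 2.0), neurons))  # nothing close enough
```

Because any encompassing neuron has a distance below 1, the closest neuron overall is automatically an encompassing neuron whenever a tie among non-zero votes occurs, so a single fallback covers both the tie and the no-vote cases.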

[0029] Referring now to Fig. 4, there is shown a schematic diagram of classification system 400 for classifying inputs
as corresponding to a set of s possible outputs. Classification system 400 may perform part of the processing performed
by classification system 300 of Fig. 3. Classification system 400 accepts feature vector F, represented by feature
elements $(f_0, f_1, \ldots, f_{k-1})$, and generates values $q^t$ and $q^m$ that act as pointers and/or flags to indicate the possible output
to be selected. Classification system 400 includes four subsystem levels: input level 402, processing level 404, output
level 406, and postprocessing level 408.

[0030] Input level 402 includes the set I of k input processing units $i_j$, where j runs from 0 to k-1. Each input processing
unit $i_j$ receives as input one and only one element $f_j$ of the feature vector F and broadcasts this value to processing
level 404. Input level 402 functions as a set of pass-through, broadcasting elements.

[0031] Processing level 404 includes the set E of $E_{num}$ elliptical processing units $e_x$, where x runs from 0 to $E_{num}-1$.
Each elliptical processing unit $e_x$ is connected to and receives input from the output of every input processing unit $i_j$ of
input level 402. Elliptical processing unit $e_x$ implements Equation (4) for neuron x of classification system 300 of Fig.
3. Like neuron x of classification system 300, each elliptical processing unit $e_x$ is defined by two vectors of internal
parameters: $B^x$ and $C^x$. The elements of vector $B^x$ are the lengths of the axes of neuron x, where:

$$B^x = (b_0^x, b_1^x, \ldots, b_{k-1}^x)^T, \qquad (5)$$



EP 0 574 936 B1 



and the elements of vector $C^x$ are the coordinates of the center point of neuron x, where:

$$C^x = (c_0^x, c_1^x, \ldots, c_{k-1}^x)^T. \qquad (6)$$

[0032] Each elliptical processing unit $e_x$ of processing level 404 computes the distance measure $r_x$ from feature
vector F to the center of neuron x. Processing level 404 is associated with means 304 of classification system 300. If
($r_x < 1$), then elliptical processing unit $e_x$ is said to be activated; otherwise, elliptical processing unit $e_x$ is not activated.
In other words, elliptical processing unit $e_x$ is activated when neuron x encompasses feature vector F. Each elliptical
processing unit $e_x$ broadcasts the computed distance measure $r_x$ to only two output processing units of output level 406.
[0033] Output level 406 includes two parts: output-total part 410 and output-minimize part 412. Output-total part 410
contains the set $O^t$ of s output processing units $o_n^t$, and output-minimize part 412 contains the set $O^m$ of s output
processing units $o_n^m$, where n runs from 0 to s-1, and where s is also the number of possible outputs for which classification
system 400 has been trained. For example, when classifying capital letters, s = 26. Each processing unit pair $(o_n^t, o_n^m)$
is associated with only one possible output and vice versa.

[0034] Each elliptical processing unit $e_x$ of processing level 404 is connected to and provides output to only one
output processing unit $o_n^t$ of output-total part 410 and to only one output processing unit $o_n^m$ of output-minimize part
412. However, each output processing unit $o_n^t$ and each output processing unit $o_n^m$ may be connected to and receive
input from one or more elliptical processing units $e_x$ of processing level 404. These relationships are represented by
connection matrices $W^t$ and $W^m$, both of which are of dimension ($s \times E_{num}$). In a preferred embodiment, if there is a
connection between elliptical processing unit $e_x$ of processing level 404 and output processing unit $o_n^t$ of output-total
part 410 of output level 406, an entry $w_{nx}^t$ in connection matrix $W^t$ will have a value that is equal to the number of training
input feature vectors encompassed by neuron x; otherwise, it has value 0. In a further preferred embodiment, entry
$w_{nx}^t$ has a value 1 if there is a connection between elliptical processing unit $e_x$ and output processing unit $o_n^t$.

[0035] Connection matrix $W^m$ represents the connections between processing level 404 and output-minimize part
412 of output level 406 and is related to connection matrix $W^t$. An entry $w_{nx}^m$ in connection matrix $W^m$ will have a value
of 1 for every entry $w_{nx}^t$ in connection matrix $W^t$ that is not zero. Otherwise, entry $w_{nx}^m$ will have a value of 0.
[0036] Each output processing unit $o_n^t$ in output-total part 410 computes an output value $o_n^t$, where:

$$o_n^t = \sum_{x=0}^{E_{num}-1} w_{nx}^t \, T(r_x), \qquad (7)$$

where the function $T(r_x)$ returns the value 0 if ($r_x > 1$); otherwise, it returns the value 1. In other words, the function
$T(r_x)$ returns the value 1 if elliptical processing unit $e_x$ of processing level 404 is activated. Output processing unit $o_n^t$
counts the votes for the possible output with which it is associated and outputs the total. Output-total part 410 of output
level 406 is associated with means 306 and means 310 of classification system 300.

[0037] Similarly, each output processing unit $o_n^m$ in output-minimize part 412 computes an output value $o_n^m$, where:

$$o_n^m = \mathop{\mathrm{MIN}}_{x=0}^{E_{num}-1} \left( w_{nx}^m \, r_x \right) \quad \text{for all } w_{nx}^m \neq 0, \qquad (8)$$

where the function "MIN" returns the minimum value of ($w_{nx}^m r_x$) over all the elliptical processing units $e_x$. Therefore,
each output processing unit $o_n^m$ examines each of the elliptical processing units $e_x$ to which it is connected and outputs
a real value equal to the minimum output value from these elliptical processing units. Output-minimize part 412 of
output level 406 is associated with means 308 of classification system 300.

[0038] Postprocessing level 408 includes two postprocessing units $p^t$ and $p^m$. Postprocessing unit $p^t$ is connected
to and receives input from every output processing unit $o_n^t$ of output-total part 410 of output level 406. Postprocessing
unit $p^t$ finds the output processing unit $o_n^t$ that has the maximum output value and generates the value $q^t$. If output
processing unit $o_n^t$ of output-total part 410 has an output value greater than those of all the other output processing
units of output-total part 410, then the value $q^t$ is set to n, the index for that output processing unit. For example,
when classifying capital letters, n may be 0 for "A" and 1 for "B", etc. Otherwise, the value $q^t$ is set to -1 to indicate that
output-total part 410 of output level 406 did not classify the input. Postprocessing unit $p^t$ of postprocessing level 408
is associated with means 312 of classification system 300.

[0039] Similarly, postprocessing unit $p^m$, the other postprocessing unit in postprocessing level 408, is connected
to and receives input from every output processing unit $o_n^m$ of output-minimize part 412 of output level 406.
Postprocessing unit $p^m$ finds the output processing unit $o_n^m$ that has the minimum output value and generates the value
$q^m$. If output processing unit $o_n^m$ of output-minimize part 412 has an output value less than a specified threshold $\theta^m$,
then the value $q^m$ is set to the corresponding index n. Otherwise, the value $q^m$ is set to -1 to indicate that
output-minimize part 412 of output level 406 did not classify the input, because the feature vector F is outside the threshold
region surrounding neuron x for all neurons x. The threshold $\theta^m$ may be the same threshold $\theta^m$ used in classification
system 300 of Fig. 3. Postprocessing unit $p^m$ of postprocessing level 408 is associated with means 316 of classification
system 300.

[0040] Classification of the input is completed by analyzing the values $q^t$ and $q^m$. If ($q^t \neq -1$), then the input is classified
as possible output $q^t$ of the set of s possible outputs. If ($q^t = -1$) and ($q^m \neq -1$), then the input is classified as possible
output $q^m$ of the set of s possible outputs. Otherwise, if both values are -1, then the input is not classified as any of the
s possible outputs.
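
The four-level arrangement of Fig. 4 can be sketched as plain list arithmetic. This is a hedged illustration, not the patented implementation: the function name `network_classify`, the list-of-lists layout of the connection matrix, and the square-root distance are assumptions. The output-minimize connections are derived from the non-zero entries of the output-total matrix, as paragraph [0035] describes, and a vote leader is additionally required to have a non-zero total so that an all-zero tally yields -1 even when there is only one output.

```python
import math


def network_classify(f, centers, axes, W_t, theta_m=1.25):
    """Return (q_t, q_m, selected index or -1) for feature vector f.

    centers[x], axes[x]: parameters of elliptical processing unit e_x.
    W_t[n][x]: output-total connection weight between unit e_x and output n.
    """
    E, s = len(centers), len(W_t)
    # Processing level: one distance r_x per elliptical processing unit.
    r = [math.sqrt(sum(((fj - cj) / bj) ** 2
                       for fj, cj, bj in zip(f, centers[x], axes[x])))
         for x in range(E)]
    T = [0 if rx > 1.0 else 1 for rx in r]  # T(r_x) as worded in [0036]
    # Output-total part: weighted vote totals.
    o_t = [sum(W_t[n][x] * T[x] for x in range(E)) for n in range(s)]
    # Output-minimize part: smallest r_x over the connected units.
    o_m = [min((r[x] for x in range(E) if W_t[n][x] != 0), default=math.inf)
           for n in range(s)]
    # Postprocessing unit p^t: index of the unique vote leader, else -1.
    top = max(o_t)
    q_t = o_t.index(top) if top > 0 and o_t.count(top) == 1 else -1
    # Postprocessing unit p^m: index of the closest output if under threshold.
    low = min(o_m)
    q_m = o_m.index(low) if low < theta_m else -1
    return q_t, q_m, (q_t if q_t != -1 else q_m)


centers = [(0.0, 0.0), (3.0, 0.0), (4.0, 0.0), (10.0, 10.0)]
axes = [(1.0, 1.0), (2.0, 1.0), (2.0, 1.0), (1.0, 1.0)]
W_t = [[1, 0, 0, 0],   # output 0 ("A") is fed by unit 0
       [0, 1, 1, 0],   # output 1 ("B") is fed by units 1 and 2
       [0, 0, 0, 1]]   # output 2 ("C") is fed by unit 3
print(network_classify((3.5, 0.0), centers, axes, W_t))
print(network_classify((0.0, 1.2), centers, axes, W_t))
print(network_classify((0.0, 2.0), centers, axes, W_t))
```

Replacing the 1 entries of `W_t` with per-neuron training counts gives the proportional-voting variant without changing the rest of the computation.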

[0041] A neural network must be trained before it may be used to classify inputs. The training system performs this
required training by generating at least one non-spherical neuron in the k-dimensional feature space. The training
system is preferably implemented off line prior to the deployment of a classification system.

[0042] The training system of the present invention generates neurons based upon a set of training inputs, where 
each training input is known to correspond to one of the possible outputs in the classification set. Continuing with the 
example of capital letters used to describe classification system 300, each training input may be a bitmap corresponding 
to one of the characters from "A" to "Z". Each character must be represented by at least one training input, although
typically 250 to 750 training inputs are used for each character.

[0043] Referring now to Fig. 5, there is shown a process flow diagram of training system 500 for generating neurons
in k-dimensional feature space that may be used in classification system 300 of Fig. 3 or in classification system 400
of Fig. 4. For example, when training for output classification, training system 500 sequentially processes a set of
training bitmap inputs corresponding to known outputs. At a particular point in the training, there will be a set of existing
feature vectors that correspond to the training inputs previously processed and a set of existing neurons that have
been generated from those existing feature vectors. For each training input, training system 500 generates a feature
vector in a feature space that represents information contained in that training input.

[0044] Training system 500 applies two rules in processing each training input. The first training rule is that if the
feature vector, corresponding to the training input currently being processed, is encompassed by any existing neurons
that are associated with a different known output, then the boundaries of those existing neurons are spatially adjusted
to exclude that feature vector, that is, to ensure that the feature vector is not inside the boundary of those existing
neurons. Otherwise, neurons are not spatially adjusted. For example, if the current training input corresponds to the
character "R" and the feature vector corresponding to that training input is encompassed by two existing "P" neurons
and one existing "B" neuron, then the boundaries of these three existing neurons are spatially adjusted to ensure they
do not encompass the current feature vector.

[0045] The second training rule is that if the current feature vector is not encompassed by at least one existing neuron
that is associated with the same known output, then a new neuron is created. Otherwise, no new neuron is created
for the current feature vector. For example, if the current training input corresponds to the character "W" and the feature
vector corresponding to that training input is not encompassed by any existing neuron that is associated with the
character "W", then a new "W" neuron is created to encompass that current feature vector. In a preferred embodiment,
a new neuron is created by generating a temporary hyper-spherical neuron and then spatially adjusting that temporary
neuron to create the new neuron. In an alternative preferred embodiment, the temporary neuron may be a non-spherical
hyper-ellipse.
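
The two training rules can be illustrated with a small per-input update. The sketch below makes several assumptions that are not taken from the text: neurons are dictionaries with `center`, `axes`, and `label`; the first rule is realized by a simple proportional shrink of all axes (scaled so that the feature vector ends up just outside the neuron, with a tiny floating-point margin); and the new neuron is a hyper-sphere of an assumed radius `lam` that is not further adjusted against earlier feature vectors. The names `train_step` and `distance` are hypothetical.

```python
import math


def distance(f, neuron):
    # Assumed square-root elliptical distance to the neuron center.
    return math.sqrt(sum(((fj - cj) / bj) ** 2
                         for fj, cj, bj in zip(f, neuron["center"], neuron["axes"])))


def train_step(f, label, neurons, lam):
    """Apply both training rules for one labeled feature vector f."""
    # Rule 1: shrink every existing neuron of a different label that
    # encompasses f, so that f is no longer inside its boundary.
    for n in neurons:
        if n["label"] != label:
            d = distance(f, n)
            if d < 1.0:
                if d == 0.0:
                    raise ValueError("f coincides with a neuron center; "
                                     "proportional shrinking cannot exclude it")
                scale = d * (1.0 - 1e-9)  # margin against rounding error
                n["axes"] = [b * scale for b in n["axes"]]
    # Rule 2: create a new neuron centered on f unless a neuron with the
    # same label already encompasses f.
    if not any(n["label"] == label and distance(f, n) < 1.0 for n in neurons):
        neurons.append({"center": list(f), "axes": [lam] * len(f), "label": label})


neurons = []
train_step((0.0, 0.0), "P", neurons, lam=2.0)  # first input: new "P" neuron
train_step((1.0, 0.0), "R", neurons, lam=2.0)  # shrinks "P", adds "R"
train_step((0.5, 0.0), "P", neurons, lam=2.0)  # shrinks "R", no new neuron
print(len(neurons), [n["label"] for n in neurons])
```

Proportional shrinking keeps the neuron shape unchanged, which makes the example short; shrinking along a single axis would instead change the shape and can preserve more of the neuron volume.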

[0046] In a preferred embodiment of the present invention, training system 500 generates hyper-elliptical neurons
from a set of training bitmap inputs corresponding to known characters. Training system 500 starts with no existing
feature vectors and no existing neurons. Processing of training system 500 begins with means 502 which selects as 
the current training input a first training input from a set of training inputs. Means 504 generates the feature vector F 
that corresponds to the current training input. 

[0047] When the first training input is the current training input, there are no existing neurons and therefore no existing
neurons that encompass feature vector F. In that case, processing of training system 500 flows to means 514 which
creates a new neuron centered on feature vector F. The new neuron is preferably defined by Equation (3), where all
the new neuron axes are set to the same length, that is, (b_j = X) for all j. Since the new neuron axes are all the same
length, the new neuron is a hyper-sphere in feature space of radius X. In a preferred embodiment, the value of constant
X may be twice as large as the largest feature element f_j of all the feature vectors F for the entire set of training inputs.
Since there are no existing feature vectors when processing the first training input, training system 500 next flows to
means 528 from which point the processing of training system 500 may be described more generally.
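
The creation of the first neuron may be sketched as follows. This is a minimal illustration rather than the implementation of means 514: the `Neuron` class and the helpers `initial_radius` and `new_spherical_neuron` are hypothetical names, and the common axis length follows the "twice the largest feature element" option described above.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Neuron:
    """Hypothetical container for one hyper-elliptical neuron."""
    label: str           # known character associated with the neuron
    center: List[float]  # center point c_j in feature space
    axes: List[float]    # axis lengths b_j


def initial_radius(feature_vectors: Sequence[Sequence[float]]) -> float:
    """Twice the largest feature element over the whole training set."""
    return 2.0 * max(max(f) for f in feature_vectors)


def new_spherical_neuron(label: str, feature: Sequence[float], radius: float) -> Neuron:
    """Create a neuron centered on `feature` with all axes equal to `radius`."""
    return Neuron(label, list(feature), [radius] * len(feature))


training_vectors = [[0.2, 0.5, 0.1], [0.9, 0.3, 0.4]]  # made-up feature vectors
radius = initial_radius(training_vectors)
first = new_spherical_neuron("W", training_vectors[0], radius)
print(first)
```

Because every axis starts at the same length, the neuron is a hyper-sphere until a later adjustment shrinks one or more axes.
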
[0048] Means 528 determines whether the current training input is the last training input in the set of training inputs.
If not, then means 528 directs processing of training system 500 to means 530 which selects the next training input
as the current training input. Means 504 then generates the feature vector F corresponding to the current training input.
[0049] Means 506 and 508 determine which, if any, existing neurons are to be spatially adjusted to avoid encompassing
feature vector F. In a preferred embodiment, means 510 adjusts an existing neuron if that neuron is not associated
with the same known character as the current training input (as determined by means 506) and if it encompasses
feature vector F (as determined by means 508). Means 508 determines if an existing neuron encompasses feature
vector F by calculating the "distance" measure r_x of Equation (4) and testing whether (r_x < 1) as described
earlier.
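
The encompass test may be sketched as below. Equation (4) is not reproduced in this part of the text, so the sketch assumes the usual axis-aligned elliptical form, the sum over j of ((f_j - c_j) / b_j)^2; the function names are hypothetical.

```python
from typing import Sequence


def distance_measure(feature: Sequence[float], center: Sequence[float],
                     axes: Sequence[float]) -> float:
    """Assumed form of Equation (4): sum of ((f_j - c_j) / b_j)^2."""
    return sum(((f - c) / b) ** 2 for f, c, b in zip(feature, center, axes))


def encompasses(feature: Sequence[float], center: Sequence[float],
                axes: Sequence[float]) -> bool:
    """A neuron encompasses a feature vector when the measure is below 1."""
    return distance_measure(feature, center, axes) < 1.0


center, axes = [0.0, 0.0], [2.0, 1.0]
print(encompasses([1.0, 0.0], center, axes))  # inside the ellipse
print(encompasses([0.0, 1.0], center, axes))  # on the boundary, so not encompassed
```

Under this assumption a feature vector lying exactly on the boundary has a measure of 1 and is treated as excluded, which matches the statement that an adjusted neuron may leave the feature vector "outside or on" its boundary.
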

[0050] In a preferred embodiment, means 510 spatially adjusts an existing neuron by optimally shrinking it along
only one axis. In another preferred embodiment, means 510 shrinks an existing neuron proportionally along one or
more axes. These shrinking methods are explained in greater detail later in this specification. After processing by
means 510, the current feature vector is not encompassed by any existing neurons that are associated with a character
which is different from the character associated with the training input. Hence, the current feature vector lies either
outside or on the boundaries of such existing neurons.

[0051] Training system 500 also determines if a new neuron is to be created and, if so, creates that new neuron. A
new neuron is created (by means 514) if the feature vector F is not encompassed by any existing neuron associated
with the same character as the training input (as determined by means 512). As described above, means 514 creates
a new neuron that is, preferably, a hyper-sphere of radius X.

[0052] Training system 500 then tests and, if necessary, spatially adjusts each new neuron created by means 514
to ensure that it does not encompass any existing feature vectors that are associated with a character which is different
from the character associated with the training input. Means 516, 524, and 526 control the sequence of testing a new
neuron against each of the existing feature vectors by selecting one of the existing feature vectors at a time. If a new
neuron is associated with a character different from that of the currently selected existing feature vector (as determined
by means 518) and if the new neuron encompasses that selected existing feature vector (as determined by means
520 using Equation (4)), then means 522 spatially adjusts the new neuron by one of the same shrinking algorithms
employed by means 510. Training system 500 continues to test and adjust a new neuron until all existing feature
vectors have been processed. Since the hyper-spherical neuron created by means 514 is adjusted by means 522, that
hyper-spherical neuron is a temporary neuron with temporary neuron axes of equal length. Processing of training
system 500 then continues to means 528 to control the selection of the next training input.

[0053] In a preferred embodiment, the steps of (1) shrinking existing neurons for a given input, and (2) creating and
shrinking a new neuron created for that same input may be performed in parallel. Those skilled in the art will understand
that these two steps may also be performed sequentially in either order.

[0054] In a preferred embodiment, after all of the training inputs in the set of training inputs have been processed
sequentially, means 528 directs processing of training system 500 to means 532. After processing a set of training
inputs with their corresponding feature vectors, feature space is populated with both feature vectors and neurons. After
processing the set of training inputs one time, some feature vectors may not be encompassed by any neurons. This
occurs when feature vectors that were, at some point in the training process, encompassed by neuron(s) of the same
character become excluded from those neurons when those neurons are shrunk to avoid subsequent feature vectors
associated with a different character. In such a situation, means 532 directs processing to return to means 502 to
repeat processing of the entire set of training inputs. When repeating this processing, the previously created neurons
are retained. By iteratively repeating this training process, new neurons are created with each iteration until eventually
each and every feature vector is encompassed by one or more neurons that are associated with the proper output and
no feature vectors are encompassed by neurons associated with different possible outputs. Moreover, this iterative
training is guaranteed to converge in a finite period of time with the maximum number of iterations being equal to the
total number of training inputs.
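
The overall training flow described in the preceding paragraphs may be sketched as follows. This is an illustrative reading, not the patented implementation: it assumes the elliptical distance measure sum((f_j - c_j)^2 / b_j^2) for Equation (4), uses a one-axis shrink derived from that assumption, and assumes that no two training inputs with different characters share the same feature vector. All names (`Neuron`, `train`, `shrink_one_axis`, `EPS`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

EPS = 1e-9  # tolerance so that points on the boundary count as excluded


@dataclass
class Neuron:
    label: str
    center: List[float]
    axes: List[float]


def measure(n: Neuron, f: Sequence[float]) -> float:
    """Assumed elliptical distance measure; below 1 means encompassed."""
    return sum(((fj - cj) / bj) ** 2 for fj, cj, bj in zip(f, n.center, n.axes))


def encompasses(n: Neuron, f: Sequence[float]) -> bool:
    return measure(n, f) < 1.0 - EPS


def shrink_one_axis(n: Neuron, f: Sequence[float]) -> None:
    """Shrink the single axis that keeps the most volume while putting f on the boundary."""
    terms = [((fj - cj) / bj) ** 2 for fj, cj, bj in zip(f, n.center, n.axes)]
    total = sum(terms)
    # Squared shrink ratio for axis i: terms[i] / (1 - sum of the other terms).
    ratios = [t / (1.0 - (total - t)) for t in terms]
    i = max(range(len(ratios)), key=ratios.__getitem__)
    n.axes[i] *= math.sqrt(ratios[i])


Sample = Tuple[str, Tuple[float, ...]]


def train(samples: Sequence[Sample], radius: float) -> List[Neuron]:
    neurons: List[Neuron] = []
    seen: List[Sample] = []  # feature vectors processed so far
    for _ in range(len(samples)):  # the text bounds the number of passes
        for label, f in samples:
            # Rule 1: neurons of a different character must not encompass f.
            for n in neurons:
                if n.label != label and encompasses(n, f):
                    shrink_one_axis(n, f)
            # Rule 2: create a neuron if no same-character neuron encompasses f.
            if not any(n.label == label and encompasses(n, f) for n in neurons):
                new = Neuron(label, list(f), [radius] * len(f))
                for other_label, g in seen:
                    if other_label != label and encompasses(new, g):
                        shrink_one_axis(new, g)
                neurons.append(new)
            if (label, f) not in seen:
                seen.append((label, f))
        if all(any(n.label == lab and encompasses(n, f) for n in neurons)
               for lab, f in samples):
            break  # every feature vector is covered by a neuron of its own character
    return neurons


samples = [("O", (0.0, 0.0)), ("O", (1.0, 0.0)), ("X", (3.0, 0.0)), ("X", (4.0, 1.0))]
neurons = train(samples, radius=8.0)
for n in neurons:
    print(n)
```

The sketch runs the two steps sequentially for each input; as noted in paragraph [0053], they could equally be run in the other order or in parallel.
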

[0055] After training system 500 completes its processing, the feature space is populated with neurons that may
then be used by characterization system 300 or characterization system 400 to classify an unknown input into one of
a plurality of possible outputs.

[0056] As mentioned earlier, in a preferred embodiment, training system 500 spatially adjusts the boundary of a
hyper-elliptical neuron to exclude a particular feature vector by optimally shrinking along one axis. Means 510 and 522
of training system 500 may perform this one-axis shrinking by (1) identifying the axis to shrink, and (2) calculating the
new length for that axis.

[0057] Training system 500 identifies the axis n to shrink by the formula:

[Equation (9): formula not reproduced in this text]

where the function "argmax" returns the value of i that maximizes the expression in the square brackets for any i from
0 to k-1; c_j and b_j define the center point and axis lengths, respectively, of the neuron to be adjusted; and f_j define the
feature vector to be excluded by that neuron.

[0058] Training system 500 then calculates the new length b_n' for axis n by the equation:

[Equation (10): formula not reproduced in this text]

In one-axis shrinking, all other axes retain their original lengths b_j.

[0059] One-axis shrinking of an original hyper-elliptical neuron according to Equations (9) and (10) results in an
adjusted neuron with the greatest hyper-volume V that satisfies the following four criteria:

(1) The adjusted neuron is a hyper-ellipse;

(2) The center point of the original neuron is the same as the center point of the adjusted neuron;

(3) The feature vector to be excluded lies on the boundary of the adjusted neuron; and

(4) All points within or on the boundary of the adjusted neuron lie within or on the boundary of the original neuron.

The hyper-volume V is defined by:

$$V = C_k \prod_{j=0}^{k-1} b_j$$

where C_k is a constant that depends on the value of k, where k is the dimension of the feature space, and b_j are the
lengths of the axes defining the adjusted neuron. One-axis shrinking, therefore, provides a first method for optimally
adjusting neurons.
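
Since the formulas for Equations (9) and (10) are not reproduced in this text, the following derivation shows one way to obtain a one-axis shrink that meets the four criteria. It assumes that a neuron encompasses a feature vector when the sum over j of (f_j - c_j)^2 / b_j^2 is less than 1 (an assumed reading of Equation (4)); the expressions are a reconstruction under that assumption and may differ in form from the patent's own equations.

```latex
% Shrink only axis i to a new length b_i' so that F lies on the boundary
% (criterion 3), keeping the center and all other axes unchanged:
\frac{(f_i - c_i)^2}{{b_i'}^2} + \sum_{j \ne i} \frac{(f_j - c_j)^2}{b_j^2} = 1
\quad\Longrightarrow\quad
b_i' = \frac{|f_i - c_i|}{\sqrt{1 - \sum_{j \ne i} (f_j - c_j)^2 / b_j^2}}

% The hyper-volume is proportional to the product of the axis lengths, so
% shrinking axis i scales it by b_i'/b_i. The greatest remaining volume is
% obtained by choosing
n = \operatorname*{argmax}_{0 \le i \le k-1} \frac{b_i'}{b_i}
  = \operatorname*{argmax}_{0 \le i \le k-1}
    \left[ \frac{(f_i - c_i)^2}
                {b_i^2 \left( 1 - \sum_{j \ne i} (f_j - c_j)^2 / b_j^2 \right)} \right]
```

As a small check, take a two-dimensional neuron centered at the origin with both axes of length 8 and a feature vector (3, 0) to exclude. Shrinking axis 0 gives a new length of 3 (ratio 3/8), whereas shrinking axis 1 would require a length of 0, so axis 0 is selected.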

[0060] In an alternative preferred embodiment, training system 500 spatially adjusts the boundary of a hyper-elliptical
neuron to exclude a particular feature vector by shrinking proportionally along one or more axes. Means 510 and 522
of training system 500 may perform proportional shrinking by calculating the vector ΔB of axis length changes Δb_j,
where:

[Equation (12): formula not reproduced in this text]

where:

$$\Delta B = (\Delta b_0, \Delta b_1, \ldots, \Delta b_{k-1}),$$

(21)

where |f_0 - c_0| is the absolute value of (f_0 - c_0); ||F - C|| is the magnitude of the vector difference between F and C; c_j and
b_j define the center point and axis lengths, respectively, of the neuron to be adjusted; f_j are the elements of the feature
vector to be excluded from that neuron; and α and γ_j may be constants. The new axis lengths b_j' for the adjusted neuron
are calculated by:

$$b_j' = b_j + \Delta b_j \qquad (22)$$

for j from 0 to k-1.

[0061] In proportional shrinking, training system 500 determines the projections of a vector onto the axes of the
neuron to be adjusted, where the vector points from the center of that neuron to the feature vector to be excluded.
These projections are represented by the vector of cosines of Equation (14). Training system 500 then determines
how much to shrink each neuron axis based on the relationship between the length of the axis and the length of the
projection onto that axis.

[0062] In a preferred embodiment, the constant α in Equation (12) is selected to be less than 1. In this case, training
system 500 may perform iterative shrinking, where the neuron is slowly adjusted over multiple axis-shrinking steps
until it is determined that the feature vector to be excluded is outside the adjusted neuron. In a preferred embodiment,
parameter γ_j may be set to a positive value that is roughly 0.001 times the size of axis j to ensure that proportional
shrinking eventually places the feature vector outside the neuron. In an alternative preferred embodiment, the parameters
γ_j may be error functions based on the distance from the feature vector to the boundary of the slowly adjusted
neuron. In such case, training system 500 may operate as a proportional integral controller for adjusting neurons.
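
Iterative proportional shrinking may be sketched as below. The update rule is an assumption, because the governing equation is not reproduced in this text: each axis is reduced by a gain times the cosine of the projection onto that axis times the gap between the axis length and a small floor. The names `proportional_shrink`, `alpha`, and `gamma_scale` are hypothetical, as is the elliptical distance measure used to test exclusion.

```python
import math
from typing import List, Sequence


def measure(f: Sequence[float], center: Sequence[float], axes: Sequence[float]) -> float:
    """Assumed elliptical distance measure; 1 or more means excluded."""
    return sum(((fj - cj) / bj) ** 2 for fj, cj, bj in zip(f, center, axes))


def proportional_shrink(center: Sequence[float], axes: Sequence[float],
                        f: Sequence[float], alpha: float = 0.5,
                        gamma_scale: float = 0.001, max_steps: int = 10_000) -> List[float]:
    """Iteratively shrink axes until f is no longer encompassed.

    Assumed update (not taken from the patent's equations):
        delta_b_j = -alpha * cos_j * (b_j - gamma_j)
    where cos_j = |f_j - c_j| / ||F - C|| and gamma_j = gamma_scale * b_j.
    """
    b = list(axes)
    gamma = [gamma_scale * bj for bj in axes]  # small positive floor per axis
    norm = math.sqrt(sum((fj - cj) ** 2 for fj, cj in zip(f, center)))
    if norm == 0.0:
        raise ValueError("feature vector coincides with the neuron center")
    cosines = [abs(fj - cj) / norm for fj, cj in zip(f, center)]
    for _ in range(max_steps):
        if measure(f, center, b) >= 1.0:
            break  # the feature vector is now on or outside the boundary
        b = [bj - alpha * cos_j * (bj - gj) for bj, cos_j, gj in zip(b, cosines, gamma)]
    return b


center, axes, f = [0.0, 0.0], [4.0, 4.0], [1.0, 1.0]
shrunk = proportional_shrink(center, axes, f)
print(shrunk, measure(f, center, shrunk))
```

With a gain below 1 the neuron shrinks gradually, and axes that are more closely aligned with the direction of the feature vector shrink faster than the others.
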
[0063] In a preferred embodiment, the set of training inputs, used sequentially by the training system to generate
neurons, may be organized according to input quality. The training inputs may be ordered to train with higher quality
inputs before proceeding to those of lower quality. This quality ordering of training inputs ensures that neurons are
centered about feature vectors that correspond to inputs of higher quality. Such ordered training may improve the
performance efficiency of a classification system by reducing the numbers of neurons needed to define the classification
system. Such ordering may also reduce the numbers of misclassifications and non-classifications made by the
classification system. A misclassification is when a classification system selects one possible output when, in truth, the
input corresponds to a different possible output. A non-classification is when a classification system fails to select one
of the known outputs and instead outputs a no-output-selected result.

[0064] Referring now to Figs. 1(a), 1(b), 1(c), and 1(d), there are shown bitmap representations of a nominal letter
"O", a degraded letter "O", a nominal number "7", and a degraded number "7", respectively. A nominal input is an ideal
input with no noise associated with it. A degraded input is one in which noise has created deviations from the nominal
input. Degraded inputs may result from either controlled noise or real unpredictable noise.

[0065] In a preferred embodiment, the training system of the present invention may train with training inputs of three
different quality levels. The first level of training inputs are nominal inputs like those presented in Figs. 1(a) and 1(c).
The second level of training inputs are controlled noise inputs, a type of degraded input created by applying defined
noise functions or signals with different characteristics, either independently or in combination, to nominal inputs. The
third level of training inputs are real noise inputs, a second type of degraded inputs which, in the case of characters,
may be optically acquired images of known characters. Such degraded inputs have real unpredictable noise. Figs. 1(b)
and 1(d) present representations of possible controlled noise inputs and real noise inputs. In a preferred embodiment,
the nominal inputs have the highest quality, with the controlled noise inputs and real noise inputs of
lesser quality. Depending upon the controlled noise functions and signals applied, a particular controlled-noise input
may be of greater or lesser quality than a particular real-noise input.

[0066] The quality of a particular degraded input - of either controlled-noise or real-noise variety - may be determined
by comparing the degraded input to a nominal input corresponding to the same known character. In a preferred
embodiment, a quality measure may be based on the number of pixels that differ between the two inputs. In another
preferred embodiment, the quality measure may be based on conventional feature measures such as Grid or Hadamard
features.
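
The pixel-count quality measure, and the ordering of training inputs by quality that it enables, may be sketched as follows. The bitmaps are made-up 3x3 examples and the function names are hypothetical; equal-sized binary bitmaps are assumed.

```python
from typing import List, Sequence

Bitmap = Sequence[Sequence[int]]


def pixel_difference(nominal: Bitmap, degraded: Bitmap) -> int:
    """Count pixels that differ between two bitmaps of the same size."""
    return sum(p != q
               for row_n, row_d in zip(nominal, degraded)
               for p, q in zip(row_n, row_d))


def order_by_quality(nominal: Bitmap, inputs: Sequence[Bitmap]) -> List[Bitmap]:
    """Highest quality (fewest differing pixels) first."""
    return sorted(inputs, key=lambda bitmap: pixel_difference(nominal, bitmap))


nominal = [[0, 1, 0],
           [0, 1, 0],
           [0, 1, 0]]
light_noise = [[0, 1, 0],
               [0, 1, 1],
               [0, 1, 0]]
heavy_noise = [[1, 1, 0],
               [0, 0, 1],
               [0, 1, 1]]
ordered = order_by_quality(nominal, [heavy_noise, light_noise, nominal])
print([pixel_difference(nominal, b) for b in ordered])
```

A lower count indicates a higher-quality input, so sorting in ascending order presents the nominal input first and the most degraded input last.
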

[0067] In a preferred embodiment, training systems train first with the nominal inputs and then later with degraded
controlled-noise and real-noise inputs. In this preferred embodiment, training with inputs corresponding to Figs. 1(a)
and 1(c) would precede training with those of Figs. 1(b) and 1(d). In another preferred embodiment, the training system
trains with all inputs of the same known character prior to proceeding to the next known character, and the training
inputs of each known character are internally organized by quality. In this preferred embodiment, training with Fig. 1(a)
precedes that with Fig. 1(b), and training with Fig. 1(c) precedes that with Fig. 1(d). Those skilled in the art will
understand that the exact overall sequence of training with all of the inputs is of lesser importance than ordering of
inputs by quality for each different known character.

[0068] After the training system has completed training, the feature space is populated with neurons that encompass
feature vectors, with one feature vector corresponding to each distinct training input. Each neuron may encompass
one or more feature vectors - the one at the center of the neuron that was used to create the neuron and possibly
other feature vectors corresponding to inputs associated with the same known character.

[0069] Depending upon the quality ordering of the training inputs used in the sequential training, a particular neuron 






EP 0 574 936 B1 



may encompass those feature vectors in a more or less efficient manner. For example, if the feature vector used to
create a particular neuron corresponds to a highly degraded input, then that feature vector will lie at the center of that
neuron. That same neuron may also encompass other feature vectors corresponding to nominal inputs and inputs of
lesser degradation. Such a neuron may not be the most efficient neuron for encompassing that set of feature vectors.
A classification system using such a neuron may make more misclassifications and non-classifications than one using
a more efficient neuron.

[0070] A refinement system spatially adjusts neurons, created during training, to create more efficient neurons. This
refinement system may characterize the spatial distribution of feature vectors encompassed by a particular neuron
and then spatially adjust that neuron. Such spatial adjustment may involve translating the neuron from its current center
point toward the mean of the spatial distribution of those feature vectors. After translating the neuron, the axis lengths
may be adjusted to ensure that feature vectors of the same output character are encompassed by the neuron and to
ensure that feature vectors of different output character are excluded.
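
The translation step of such a refinement may be sketched as below. Only the move of the center toward the mean is shown; the subsequent adjustment of axis lengths is omitted. The helper names and the `step` fraction are hypothetical.

```python
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Arithmetic mean of a non-empty set of feature vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def translate_toward(center: Sequence[float], target: Sequence[float],
                     step: float = 1.0) -> List[float]:
    """Move `center` a fraction `step` of the way toward `target`."""
    return [c + step * (t - c) for c, t in zip(center, target)]


encompassed = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]  # made-up feature vectors
old_center = [0.0, 0.0]  # e.g. a neuron created from a highly degraded input
mean = centroid(encompassed)
print(translate_toward(old_center, mean, step=0.5))
```

A `step` of 1.0 places the center exactly on the mean, while smaller values move it only part of the way.
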

[0071] In an alternative embodiment, the refinement system may spatially adjust two or more neurons of the same
character to create one or more neurons that more efficiently encompass the same feature vectors, where a feature
vector from one original neuron may be encompassed by a different more efficient neuron. For example, before
refinement, a first neuron may encompass feature vectors F_1, F_2, and F_3, and a second neuron may encompass feature
vectors F_4, F_5, F_6, and F_7. After refinement, feature vectors F_1, F_2, F_3, and F_4 may be encompassed by a third neuron,
and feature vectors F_5, F_6, and F_7 may be encompassed by a fourth neuron, where the centers and axis lengths of
the third and fourth neurons are all different from those of the first and second neurons.

[0072] In a preferred embodiment, a classification system classifies inputs into one of a set of possible outputs by
comparing the feature vector, for each input to be classified, with every neuron in the feature space. Such classification
systems are presented in Figs. 3 and 4.

[0073] Referring now to Fig. 6, there is shown classification system 600 - a second preferred embodiment - in which
inputs are classified into one of a set of possible outputs using neurons and cluster classifiers. Classification system
600 includes top-level classifier 602 and two or more cluster classifiers 604, 606, 608. Top-level classifier 602
classifies inputs into appropriate clusters of inputs. For example, where classification system 600 classifies characters,
top-level classifier 602 may classify input bitmaps corresponding to optically acquired characters into clusters of
characters.

[0074] The characters clustered together may be those represented by similar bitmaps, or, in other words, those
characters associated with feature vectors close to one another in feature space. For example, a first character cluster
may correspond to the characters "D", "P", "R" and "B". A second character cluster may correspond to the characters
"O", "C", "D", "U", and "Q". A third cluster may correspond to only one character such as the character "Z". A particular
character may be in more than one character cluster. In this example, the character "D" is in both the first and the
second character clusters, because its bitmaps are similar to those of both clusters.

[0075] In a preferred embodiment, before training, characters are clustered based on a confusion matrix. The confusion
matrix represents the likelihood that one character will be confused with another character for every possible
pair of characters. In general, the closer the feature vectors of one character are to those of another character, the
higher the likelihood that those two characters may be confused. For example, the character "D" may have a higher
confusion likelihood with respect to the "O" than to the "M", if the feature vectors for "D" are closer to the feature vectors
for "O" than to those for "M".

[0076] In a preferred embodiment, the clustering of characters is based upon a conventional K-Means Clustering
Algorithm, in which a set of templates is specified for each character, where each template is a point in feature space.
The K-Means Clustering Algorithm determines where in feature space to locate the templates for a particular character
by analyzing the locations of the feature vectors for all of the training inputs corresponding to that character. Templates
are preferably positioned near the arithmetic means of clusters of associated feature vectors.
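
Placing templates for one character may be sketched with a plain Lloyd-style K-Means loop, shown below. The seeding with the first k vectors and the handling of empty groups are simplifying assumptions of this sketch, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means_templates(vectors: Sequence[Point], k: int, max_iter: int = 100) -> List[Point]:
    """Plain Lloyd-style K-Means; the first k vectors seed the templates."""
    templates = [tuple(v) for v in vectors[:k]]
    for _ in range(max_iter):
        groups: List[List[Point]] = [[] for _ in templates]
        for v in vectors:
            nearest = min(range(len(templates)),
                          key=lambda i: squared_distance(v, templates[i]))
            groups[nearest].append(v)
        updated = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else t  # empty groups stay put
            for g, t in zip(groups, templates)
        ]
        if updated == templates:
            break  # assignments no longer change the templates
        templates = updated
    return templates


vectors_for_one_character = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(k_means_templates(vectors_for_one_character, k=2))
```

Each returned template is the arithmetic mean of the feature vectors assigned to it, which corresponds to the preferred template placement described above.
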

[0077] In a preferred embodiment, four templates may be used for each character and the number of characters per
cluster may be roughly even. For example, when classifying the 64 characters corresponding to the 26 capital and 26
lower-case letters, the 10 digits, and the symbols "&" and "#", 4x64 or 256 templates may be used to define 7 different
clusters of roughly equivalent numbers of characters.

[0078] By clustering characters, top-level classifier 602 may implement a classification algorithm that quickly and
accurately determines the appropriate cluster for each input. In a preferred embodiment, top-level classifier 602
implements a neuron-based classification algorithm. In another preferred embodiment, other conventional non-neural
classification algorithms may be performed by top-level classifier 602. Top-level classifier 602 selects the appropriate
cluster for a particular input and directs processing to continue to the appropriate cluster classifier 604, 606, 608.
Each cluster classifier is associated with one and only one character cluster, and vice versa.

[0079] In one preferred embodiment, each cluster classifier may implement a classification algorithm unique to that
character cluster, or shared by only a subset of the total number of character clusters. Each cluster classifier may
therefore employ neurons that exist in a feature space unique to that character cluster. For example, training for the
"P", "R", "B" cluster may employ a particular set of Grid features, while training for the "O", "C", "D", "U", "Q" cluster
may employ a different set of Hadamard features. In that case, different training procedures are performed for each
different cluster classifier, where only inputs corresponding to those characters of the associated cluster are used for
each different training procedure.

[0080] In a third preferred embodiment, a classification system according to Fig. 6 may classify inputs into one of a
set of possible outputs using neurons and cluster classifiers. In this third embodiment, top-level classifier 602 identifies
the template in feature space closest to the feature vector for the current input to be classified. The identified template
is associated with a particular character that belongs to one or more character clusters. The top-level classifier 602
directs processing to only those cluster classifiers 604, 606, 608 associated with the character clusters of the closest
template. Since a particular character may be in more than one character cluster, more than one cluster classifier may
be selected by top-level classifier 602 for processing.
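
The template-based routing of this third embodiment may be sketched as follows. The cluster contents follow the example clusters given earlier; the template coordinates, the cluster names, and the use of Euclidean distance to find the closest template are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple

Template = Tuple[str, Tuple[float, ...]]  # (character, point in feature space)


def closest_character(feature: Sequence[float], templates: Sequence[Template]) -> str:
    """Character of the template nearest to the feature vector (Euclidean)."""
    def squared_distance(point: Sequence[float]) -> float:
        return sum((f - p) ** 2 for f, p in zip(feature, point))
    return min(templates, key=lambda t: squared_distance(t[1]))[0]


def clusters_for(character: str, clusters: Dict[str, Set[str]]) -> List[str]:
    """Names of all character clusters that contain the character."""
    return [name for name, members in clusters.items() if character in members]


clusters = {
    "first": {"D", "P", "R", "B"},
    "second": {"O", "C", "D", "U", "Q"},
    "third": {"Z"},
}
templates = [("D", (1.0, 1.0)), ("O", (2.0, 2.0)), ("Z", (9.0, 0.0))]  # made-up points
character = closest_character((1.2, 0.9), templates)
print(character, clusters_for(character, clusters))
```

Because "D" belongs to two clusters in this example, both of the corresponding cluster classifiers would be selected for further processing.
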

[0081] In a fourth preferred embodiment, each cluster classifier may have a decision tree that identifies those neurons
that should be processed for a given input. Prior to classifying, feature vector space for a particular cluster classifier
may be divided into regions according to the distribution of feature vectors and/or neurons in feature space. Each
region contains one or more neurons, each neuron may belong to more than one region, and two or more regions may
overlap. Top-level classifier 602 may determine in which feature-space region (or regions) the feature vector for the
current input lies and may direct the selected cluster classifiers to process only those neurons associated with the
region (or those regions).

[0082] Those skilled in the art will understand that some classification systems may use decision trees without cluster
classifiers, some may use cluster classifiers without decision trees, some may use both, and others may use neither.
Those skilled in the art will further understand that decision trees and cluster classifiers may increase the efficiency of
classification systems by reducing processing time.

[0083] Those skilled in the art will understand that classifying systems may be arranged in series or parallel. For
example, in a preferred embodiment, a first character classifier based on Grid features may be arranged in series with
a second character classifier based on Hadamard features. In such case, the first classifier classifies a particular bitmap
input as one of the known characters or it fails to classify that input. If it fails to classify, then the second classifier
attempts to classify that input.

[0084] In an alternative embodiment, two or more different classifiers may be arranged in parallel. In such case, a 
voting scheme may be employed to select the appropriate output by comparing the outputs of each different classifier. 
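
The series and parallel arrangements may be sketched as below. The classifiers are stand-in functions that return a character or `None` when they fail to classify. The text does not specify the voting scheme, so the plurality rule with ties treated as no selection is an assumption of this sketch.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

Classifier = Callable[[object], Optional[str]]  # returns None when it fails to classify


def classify_in_series(classifiers: Sequence[Classifier], item: object) -> Optional[str]:
    """Try each classifier in turn; the first non-None answer wins."""
    for classify in classifiers:
        result = classify(item)
        if result is not None:
            return result
    return None


def classify_by_vote(classifiers: Sequence[Classifier], item: object) -> Optional[str]:
    """Simple plurality vote; ties and all-None results give no selection."""
    votes = Counter(r for r in (c(item) for c in classifiers) if r is not None)
    ranked = votes.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


# Stand-in classifiers for illustration only.
grid_based = lambda bitmap: None        # fails to classify
hadamard_based = lambda bitmap: "O"
third = lambda bitmap: "Q"

print(classify_in_series([grid_based, hadamard_based], "bitmap"))
print(classify_by_vote([hadamard_based, hadamard_based, third], "bitmap"))
```

In the series case the second classifier is consulted only when the first fails, whereas in the parallel case every classifier runs and the outputs are compared.
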

[0085] In a preferred embodiment, classification systems and training systems perform parallel processing, where
each elliptical processing unit may run on a separate computer processor during classification, although those skilled
in the art will understand that these systems may also perform serial processing. In a preferred embodiment, the
classification systems and training systems may reside in a reduced instruction set computer (RISC) processor such
as a SPARC 2 processor running on a SPARC-station 2 marketed by Sun Microsystems.

[0086] Those skilled in the art will understand that inputs other than character images may be classified with the
classification systems. In general, any input may be classified as being one of a set of two or more possible outputs,
where a no-selection result is one of the possible outputs. For example, the classification systems may be used to
identify persons based upon images of their faces, fingerprints, or even earlobes. Other classification systems of the
present invention may be used to identify people from recordings of their voices.



Claims 

1. A classification method for classifying an input into one of a plurality of possible outputs, comprising the steps of:

(a) comparing information representative of said input to a neuron, wherein said neuron comprises a boundary
defined by two or more neuron axes of different length; and

(b) selecting one of said possible outputs as corresponding to said input in accordance with the comparison
of step (a), wherein

step (a) comprises the step of comparing information representative of said input to a plurality of neurons,
wherein each neuron of said plurality of neurons comprises a boundary defined by two or more neuron axes of
different length,
characterised in that
said information representative of said input comprises a feature vector, wherein step (a) further comprises
the step of selecting each neuron that encompasses said feature vector,
by determining distance measures from said feature vector to each of said neurons, and wherein step (b) comprises
the steps of:

(i) selecting a neuron of said plurality of neurons having the smallest distance measure of said distance measures;
and

(ii) selecting a possible output of said plurality of possible outputs that is associated with said selected neuron
as corresponding to said input,

and wherein step (b) further comprises the steps of:

(i) determining a first number that is a function of the number of said selected neurons that are associated
with a first possible output of said plurality of possible outputs;

(ii) determining a second number that is a function of the number of said selected neurons that are associated
with a second possible output of said plurality of possible outputs; and

(iii) if said first number is greater than said second number then determining that said input does not correspond
to said second possible output, else if said second number is greater than said first number then determining
that said input does not correspond to said first possible output.

2. The classification method of claim 1, wherein said neuron is in a feature space comprising two or more feature-space
axes, wherein at least one of said neuron axes is parallel to one of said feature-space axes.

3. The classification method of claim 1, wherein said neuron is in a feature space comprising two or more feature-space
axes, wherein at least one of said neuron axes is parallel to none of said feature-space axes.

4. The classification method of claim 1, wherein said first number is equal to the number of said selected neurons
that are associated with said first possible output, and said second number that is equal to the number of said
selected neurons that are associated with said second possible output.

5. The method of claim 1, wherein said feature vector comprises a feature element, wherein said feature element is
a Grid feature element or a Hadamard feature element.

6. The method of claim 1, wherein said neuron is a hyper-ellipse or a hyper-rectangle in k-dimensional feature space,
where k is greater than or equal to two.

7. A classification apparatus for classifying an input into one of a plurality of possible outputs, comprising:

comparing means for comparing information representative of said input to a neuron, wherein said neuron
comprises a boundary defined by two or more neuron axes of different length; and

selecting means for selecting one of said possible outputs as corresponding to said input in accordance with
the comparison by said comparing means,

wherein said comparing means compares information representative of said input to a plurality of neurons,
wherein each neuron of said plurality of neurons comprises a boundary defined by two or more neuron axes of
different length,

characterised in that said information representative of said input comprises a feature vector, said comparing
means selects each neuron that encompasses said feature vector, said comparing means determines distance
measures from said feature vector to each of said neurons, said selecting means selects a neuron of said plurality
of neurons having the smallest distance measure of said distance measures, and said selecting means selects a
possible output of said plurality of possible outputs that is associated with said selected neuron as corresponding
to said input,

said selecting means determines a first number that is a function of the number of said selected neurons that are
associated with a first possible output of said plurality of possible outputs, said selecting means determines a
second number that is a function of the number of said selected neurons that are associated with a second possible
output of said plurality of possible outputs, and if said first number is greater than said second number, then said
selecting means determines that said input does not correspond to said second possible output, else if said second
number is greater than said first number, then said selecting means determines that said input does not correspond
to said first possible output.

8. The classification apparatus of claim 7, wherein said neuron is in a feature space comprising two or more feature-space
axes, wherein at least one of said neuron axes is parallel to one of said feature-space axes.

9. The classification apparatus of claim 7, wherein said neuron is in a feature space comprising two or more feature-space
axes, wherein at least one of said neuron axes is parallel to none of said feature-space axes.

10. The classification apparatus of claim 7, wherein said first number is equal to the number of said selected neurons
that are associated with said first possible output, and said second number that is equal to the number of said
selected neurons that are associated with said second possible output.

11. The apparatus of claim 7, wherein said feature vector comprises a feature element, wherein said feature element
is a Grid feature element or a Hadamard feature element.

12. The apparatus of one of the claims 7, wherein said neuron is a hyper-ellipse or a hyper-rectangle in k-dimensional
feature space, where k is greater than or equal to two.

is Patentanspriiche 

1. Classification method for classifying an input into one of a plurality of possible outputs, comprising
the steps of:

(a) comparing information representative of the input with a neuron having a boundary defined by two or more
neuron axes of different lengths; and

(b) selecting one of the possible outputs as corresponding to the input in accordance with the comparison of
step (a), wherein

step (a) comprises the step of comparing information representative of the input with a plurality of neurons,
each of the neurons having a boundary defined by neuron axes of different lengths,
characterized in that

the information representative of the input comprises a feature vector, and wherein step (a) comprises the step
of selecting each neuron that encompasses the feature vector by determining a distance measure from the feature
vector to each of the neurons, and wherein step (b) comprises the following steps:

(i) selecting that one of the neurons which has the smallest of the distance measures; and
(ii) selecting that one of the possible outputs which is associated with the selected neuron as corresponding
to the input,

and wherein step (b) further comprises the following steps:

(i) determining a first number that is a function of the number of the selected neurons that are associated with
a first possible output of the plurality of possible outputs;
(ii) determining a second number that is a function of the number of the selected neurons that are associated
with a second possible output of the plurality of possible outputs; and
(iii) if the first number is greater than the second number, determining that the input does not correspond to the
second possible output, and if the second number is greater than the first number, determining that the input does
not correspond to the first possible output.

2. Classification method according to claim 1, wherein the neuron lies in a feature space having two or more
feature-space axes, wherein at least one of the neuron axes is parallel to one of the feature-space axes.

3. Classification method according to claim 1, wherein the neuron lies in a feature space having two or more
feature-space axes, wherein at least one of the neuron axes is parallel to none of the feature-space axes.

4. Classification method according to claim 1, wherein the first number is equal to the number of the selected
neurons that are associated with the first possible output, and the second number is equal to the number of the
selected neurons that are associated with the second possible output.



5. Method according to claim 1, wherein the feature vector comprises a feature element, wherein the feature
element is a Grid feature element or a Hadamard feature element.

6. Method according to claim 1, wherein the neuron is a hyper-ellipse or a hyper-rectangle in a k-dimensional
feature space, where k is greater than or equal to two.

7. Classification apparatus for classifying an input into one of a plurality of possible outputs, comprising:
comparing means for comparing information representative of the input with a neuron, wherein the neuron has a
boundary defined by two or more neuron axes of different lengths; and
selecting means for selecting one of the possible outputs as corresponding to the input in accordance with the
comparison by the comparing means,

wherein the comparing means compares information representative of the input with a plurality of neurons, each of
the neurons having a boundary defined by two or more neuron axes of different lengths,

characterized in that the information representative of the input comprises a feature vector, the comparing means
selects each neuron that encompasses the feature vector, wherein the comparing means determines a distance measure
from the feature vector to each of the neurons and the selecting means selects that one of the neurons which has
the smallest of the distance measures, and wherein the selecting means selects, as corresponding to the input, that
one of the possible outputs which is associated with the selected neuron, wherein the selecting means determines a
first number that is a function of the number of the selected neurons that are associated with a first possible
output of the possible outputs, wherein the selecting means determines a second number that is a function of the
number of the selected neurons that are associated with a second possible output of the possible outputs, wherein,
if the first number is greater than the second number, the selecting means determines that the input does not
correspond to the second possible output, and, if the second number is greater than the first number, the selecting
means determines that the input does not correspond to the first possible output.

8. Classification apparatus according to claim 7, wherein the neuron lies in a feature space having two or more
feature-space axes, wherein at least one of the neuron axes is parallel to one of the feature-space axes.

9. Classification apparatus according to claim 7, wherein the neuron lies in a feature space having two or more
feature-space axes, wherein at least one of the neuron axes is parallel to none of the feature-space axes.

10. Classification apparatus according to claim 7, wherein the first number is equal to the number of the selected
neurons that are associated with the first possible output, and wherein the second number is equal to the number of
the selected neurons that are associated with the second possible output.

11. Apparatus according to claim 7, wherein the feature vector comprises a feature element, wherein the feature
element is a Grid feature element or a Hadamard feature element.

12. Apparatus according to claim 7, wherein the neuron is a hyper-ellipse or a hyper-rectangle in a k-dimensional
feature space, where k is greater than or equal to two.



Claims (French version: Revendications)

1. Classification method for classifying an input into one of a plurality of possible outputs, comprising
the following steps:

a) comparing information representative of said input with a neuron, wherein said neuron comprises a boundary
defined by two or more neuron axes of different lengths; and

b) selecting one of said possible outputs as corresponding to said input in accordance with the comparison of
step (a),

wherein

step (a) comprises the step of comparing the information representative of said input with a plurality of
neurons, wherein each neuron of said plurality of neurons comprises a boundary defined by two or more neuron axes
of different lengths,

characterized in that

said information representative of said input comprises a feature vector, wherein step (a) further comprises the
step of selecting each neuron that encompasses said feature vector by determining the distance measures from said
feature vector to each of said neurons, and wherein step (b) comprises the following steps:

i. selecting a neuron of said plurality of neurons having the smallest distance measure of said distance
measures; and

ii. selecting a possible output of said plurality of possible outputs that is associated with said selected neuron
as corresponding to said input,

and wherein step (b) further comprises the following steps:

i. determining a first number that is a function of the number of said selected neurons that are associated with
a first possible output of said plurality of possible outputs;

ii. determining a second number that is a function of the number of said selected neurons that are associated with
a second possible output of said plurality of possible outputs; and

iii. if said first number is greater than said second number, then determining that said input does not
correspond to said second possible output, or else if said second number is greater than said first number, then
determining that said input does not correspond to said first possible output.

2. Classification method according to claim 1, wherein said neuron is in a feature space comprising two or more
feature-space axes, wherein at least one of said neuron axes is parallel to one of said feature-space axes.

3. Classification method according to claim 1, wherein said neuron is in a feature space comprising two or more
feature-space axes, wherein at least one of said neuron axes is parallel to none of said feature-space axes.

4. Classification method according to claim 1, wherein said first number is equal to the number of said selected
neurons that are associated with said first possible output, and said second number is equal to the number of said
selected neurons that are associated with said second possible output.

5. Method according to claim 1, wherein said feature vector comprises a feature element, wherein said feature
element is a Grid feature element or a Hadamard feature element.

6. Method according to claim 1, wherein said neuron is a hyper-ellipse or a hyper-rectangle in k-dimensional
feature space, where k is greater than or equal to two.

7. Classification apparatus for classifying an input into one of a plurality of possible outputs, comprising:

comparing means for comparing information representative of said input with a neuron, wherein said neuron
comprises a boundary defined by two or more neuron axes of different lengths; and

selecting means for selecting one of said possible outputs as corresponding to said input in accordance with the
comparison made by said comparing means,

wherein said comparing means compares information representative of said input with a plurality of neurons,
wherein each neuron of said plurality of neurons comprises a boundary defined by two or more neuron axes of
different lengths, characterized in that

said information representative of said input comprises a feature vector, said comparing means selects each neuron
that encompasses said feature vector, said comparing means determines the distance measures from said feature
vector to each of said neurons, said selecting means selects a neuron of said plurality of neurons having the
smallest distance measure of said distance measures, and said selecting means selects a possible output of said
plurality of possible outputs that is associated with said selected neuron as corresponding to said input, said
selecting means determines a first number that is a function of the number of said selected neurons that are
associated with a first possible output of said plurality of possible outputs, said selecting means determines a
second number that is a function of the number of said selected neurons that are associated with a second possible
output of said plurality of possible outputs, and if said first number is greater than said second number, then
said selecting means determines that said input does not correspond to said second possible output, or else if
said second number is greater than said first number, then said selecting means determines that said input does
not correspond to said first possible output.

8. Classification apparatus according to claim 7, wherein said neuron is in a feature space comprising two or more
feature-space axes, wherein at least one of said neuron axes is parallel to one of said feature-space axes.

9. Classification apparatus according to claim 7, wherein said neuron is in a feature space comprising two or more
feature-space axes, wherein at least one of said neuron axes is parallel to none of said feature-space axes.

10. Classification apparatus according to claim 7, wherein said first number is equal to the number of said
selected neurons that are associated with said first possible output, and said second number is equal to the
number of said selected neurons that are associated with said second possible output.

11. Apparatus according to claim 7, wherein said feature vector comprises a feature element, wherein said feature
element is a Grid feature element or a Hadamard feature element.

12. Apparatus according to claim 7, wherein said neuron is a hyper-ellipse or a hyper-rectangle in k-dimensional
feature space, where k is greater than or equal to two.



[Drawing sheets of EP 0 574 936 B1. Legible labels: "OPTICALLY ACQUIRED CHARACTER BITMAP" and "OUTPUT CHARACTERS".]



